-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add anisotropic coefficient of variation #143
base: main
Are you sure you want to change the base?
Conversation
Excellent! I will look at this more closely when I get back from the UK at the end of the week.
Previously, we privileged column-major/'F' order arrays because that's how MATLAB stores arrays, and it was easier. Since we switched to using the The order of the loops is determined by performance. You should generally always loop over It should be possible to swap the rows and columns of the filter kernels. For example, your float C_filter_1[5][5] = {{1, 0, 1, 0, 1},
{0, 0, 0, 0, 0},
{1, 0, 0, 0, -1},
{0, 0, 0, 0, 0},
{-1, 0, -1, 0, -1}};
float F_filter_1[5][5] = {{1, 0, 1, 0, -1},
{0, 0, 0, 0, 0},
{1, 0, 0, 0, -1},
{0, 0, 0, 0, 0},
{1, 0, -1, 0, -1}}; But I can see how it quickly gets confusing to keep track of the indices. I'll need to think about whether we really need to change the loop order.
Have you compared these to the MATLAB output? As I said I can add new snapshots to the snapshot testing repository. I got a bit distracted trying to automate the snapshot testing process last week, but I can add those for you. |
@Teschl: I think this generally looks all right. I added snapshots of the
|
@wkearn I reworked the function to use col-major order. I haven't really tested this properly jet, so it might be that matching the kernel values to the DEM values is not working 100% correctly at this time. Instead of using two for loops to loop over the matrix, I switched to a single for loop approach. It would be nice if you could give this a once over to see if this makes sense to you. I also flattened the kernels in col major order but left the old row major order to better visualize how they look. I also left the old code as a comment for now. What I'm still working on:
|
How do we want to handle NaNs in this function @wkearn ? As far as I know, it's possible to do in the C function (using Also, the workflow is currently failing because:
And I don't really understand what I can do to fix this. |
I think we need to add a
Did you figure out how to handle the boundary conditions similarly to the original implementation? The NaNs are handled similarly to the boundaries in that they are set to the nearest non-NaN neighbor. We might be able to adapt that to handle NaNs in the libtopotoolbox implementation. As I said before, I don't think we actually need to fill every NaN in the DEM using something like One question is which pixel to replicate if we have something like
We should look at MATLAB's bwdist and see which pixel it chooses to be consistent with the previous behavior. If this proves to be too complicated, we might just assume that the NaNs are filled before the arrays are passed to C and document that somewhere. |
Merge or rebase #148 when you are ready, and the builds should run on Ubuntu. |
I have been testing the bwdist function in Matlab for a bit and came to the following conclusion: When looking at cell 5, if there are multiple nearest cells, it will choose the one with the lowest index. So 2 will beat 4, 6, 8 for example.
Makes sense. I can also hard-code the search order, since the cell that the kernel is applied to (the center of the kernel) will have to have a value. |
@Teschl: how is this going? I like what you came up with for handling the NaN/boundary conditions in the last couple of commits. I think there are a few minor issues, which you might be aware of: for example, More importantly, though, have you been testing the output of I added Below is a patch to I also pushed this test to my diff --git a/test/snapshot.cpp b/test/snapshot.cpp
index b4b7cbf..11c0445 100644
--- a/test/snapshot.cpp
+++ b/test/snapshot.cpp
@@ -40,7 +40,29 @@ void load_data_from_file(const std::string& filename, std::vector<T>& output,
GDALClose(dataset);
}
+template <typename T, GDALDataType S>
+void write_data_to_file(const std::string& dst_filename,
+ const std::string& src_filename, std::vector<T>& output,
+ std::array<ptrdiff_t, 2>& dims) {
+ GDALDataset* src_ds = GDALDataset::Open(src_filename.data(), GA_ReadOnly);
+ assert(src_ds);
+
+ GDALDriver* driver = src_ds->GetDriver();
+ GDALDataset* dst_ds =
+ driver->CreateCopy(dst_filename.data(), src_ds, FALSE, NULL, NULL, NULL);
+ assert(dst_ds);
+
+ GDALRasterBand* poBand = dst_ds->GetRasterBand(1);
+ assert(poBand->RasterIO(GF_Write, 0, 0, dims[0], dims[1], output.data(),
+ dims[0], dims[1], S, 0, 0) == CE_None);
+
+ GDALClose(dst_ds);
+ GDALClose(src_ds);
+}
+
struct SnapshotData {
+ std::filesystem::path path;
+
std::array<ptrdiff_t, 2> dims;
float cellsize;
@@ -48,17 +70,21 @@ struct SnapshotData {
std::vector<float> filled_dem;
std::vector<int32_t> flats;
std::vector<int32_t> sills;
+ std::vector<float> acv;
// Output arrays
std::vector<float> test_dem;
std::vector<float> test_filled_dem;
std::vector<int32_t> test_flats;
std::vector<int32_t> test_sills;
+ std::vector<float> test_acv;
// Intermediate arrays
std::vector<uint8_t> bc;
SnapshotData(const std::filesystem::path& snapshot_path) {
+ path = snapshot_path;
+
GDALAllRegister();
if (exists(snapshot_path / "dem.tif")) {
@@ -87,6 +113,12 @@ struct SnapshotData {
assert(dims[0] == dims_check[0] && dims[1] == dims_check[1]);
}
+ if (exists(snapshot_path / "acv.tif")) {
+ load_data_from_file<float, GDT_Float32>(snapshot_path / "acv.tif", acv,
+ dims_check);
+ assert(dims[0] == dims_check[0] && dims[1] == dims_check[1]);
+ }
+
// Allocate and resize output and intermediate arrays
if (dem.size() > 0) {
if (filled_dem.size() > 0) {
@@ -97,59 +129,127 @@ struct SnapshotData {
if (flats.size() > 0 && sills.size() > 0) {
test_flats.resize(dims[0] * dims[1]);
}
+ if (acv.size() > 0) {
+ test_acv.resize(dims[0] * dims[1]);
+ }
}
}
- void runtests() {
- // fillsinks
- if (test_filled_dem.size() > 0) {
- // Initialize bcs
- for (ptrdiff_t j = 0; j < dims[1]; j++) {
- for (ptrdiff_t i = 0; i < dims[0]; i++) {
- if (isnan(dem[j * dims[0] + i])) {
+ int run_fillsinks() {
+ // Initialize bcs
+ for (ptrdiff_t j = 0; j < dims[1]; j++) {
+ for (ptrdiff_t i = 0; i < dims[0]; i++) {
+ if (isnan(dem[j * dims[0] + i])) {
+ bc[j * dims[0] + i] = 1;
+ test_dem[j * dims[0] + i] = -INFINITY;
+ } else {
+ if (i == 0 || i == dims[0] - 1 || j == 0 || j == dims[1] - 1) {
+ // 1 on the boundaries
bc[j * dims[0] + i] = 1;
- test_dem[j * dims[0] + i] = -INFINITY;
} else {
- if (i == 0 || i == dims[0] - 1 || j == 0 || j == dims[1] - 1) {
- // 1 on the boundaries
- bc[j * dims[0] + i] = 1;
- } else {
- // 0 on the interior
- bc[j * dims[0] + i] = 0;
- }
+ // 0 on the interior
+ bc[j * dims[0] + i] = 0;
}
}
}
+ }
- tt::fillsinks(test_filled_dem.data(), test_dem.data(), bc.data(),
- dims.data());
+ tt::fillsinks(test_filled_dem.data(), test_dem.data(), bc.data(),
+ dims.data());
- for (ptrdiff_t j = 0; j < dims[1]; j++) {
- for (ptrdiff_t i = 0; i < dims[0]; i++) {
- if (!isnan(filled_dem[j * dims[0] + i])) {
- assert(test_filled_dem[j * dims[0] + i] ==
- filled_dem[j * dims[0] + i]);
+ for (ptrdiff_t j = 0; j < dims[1]; j++) {
+ for (ptrdiff_t i = 0; i < dims[0]; i++) {
+ if (!isnan(filled_dem[j * dims[0] + i])) {
+ if (test_filled_dem[j * dims[0] + i] != filled_dem[j * dims[0] + i]) {
+ write_data_to_file<float, GDT_Float32>(path / "test_fillsinks.tif",
+ path / "fillsinks.tif",
+ test_filled_dem, dims);
+ return -1;
}
}
}
}
- if (flats.size() > 0 && sills.size() > 0) {
- // identifyflats
- //
- // Use the snapshot filled DEM rather than the generated one in
- // case fillsinks fails.
- tt::identifyflats(test_flats.data(), filled_dem.data(), dims.data());
-
- for (ptrdiff_t j = 0; j < dims[1]; j++) {
- for (ptrdiff_t i = 0; i < dims[0]; i++) {
- if (flats[j * dims[0] + i] == 1)
- assert(test_flats[j * dims[0] + i] & 1);
-
- if (sills[j * dims[0] + i] == 1)
- assert(test_flats[j * dims[0] + i] & 2);
+ return 0;
+ }
+
+ int run_identifyflats() {
+ // identifyflats
+ //
+ // Use the snapshot filled DEM rather than the generated one in
+ // case fillsinks fails.
+ tt::identifyflats(test_flats.data(), filled_dem.data(), dims.data());
+
+ for (ptrdiff_t j = 0; j < dims[1]; j++) {
+ for (ptrdiff_t i = 0; i < dims[0]; i++) {
+ if (flats[j * dims[0] + i] == 1 && !(test_flats[j * dims[0] + i] & 1)) {
+ write_data_to_file<int32_t, GDT_Int32>(
+ path / "test_identifyflats.tif", path / "identifyflats_flats.tif",
+ test_flats, dims);
+ return -1;
+ }
+
+ if (sills[j * dims[0] + i] == 1 && !(test_flats[j * dims[0] + i] & 2)) {
+ write_data_to_file<int32_t, GDT_Int32>(
+ path / "test_identifyflats.tif", path / "identifyflats_sills.tif",
+ test_flats, dims);
+ return -1;
+ }
+ }
+ }
+ return 0;
+ }
+
+ int run_acv() {
+ // acv
+ tt::acv(test_acv.data(), dem.data(), 0, dims.data());
+
+ for (ptrdiff_t j = 0; j < dims[1]; j++) {
+ for (ptrdiff_t i = 0; i < dims[0]; i++) {
+ if (test_acv[j * dims[0] + i] != acv[j * dims[0] + i]) {
+ write_data_to_file<float, GDT_Float32>(
+ path / "test_acv.tif", path / "acv.tif", test_acv, dims);
+ return -1;
}
}
}
+
+ return 0;
+ }
+
+ int runtests() {
+ int result = 0;
+ // fillsinks
+ if (test_filled_dem.size() > 0) {
+ if (run_fillsinks() < 0) {
+ result = -1;
+
+ std::cout << "[FAILURE] (fillsinks) " << path << std::endl;
+ } else {
+ std::cout << "[SUCCESS] (fillsinks) " << path << std::endl;
+ }
+ }
+
+ if (flats.size() > 0 && sills.size() > 0) {
+ if (run_identifyflats() < 0) {
+ result = -1;
+
+ std::cout << "[FAILURE] (identifyflats) " << path << std::endl;
+ } else {
+ std::cout << "[SUCCESS] (identifyflats) " << path << std::endl;
+ }
+ }
+
+ if (acv.size() > 0) {
+ if (run_acv() < 0) {
+ result = -1;
+
+ std::cout << "[FAILURE] (acv) " << path << std::endl;
+ } else {
+ std::cout << "[SUCCESS] (acv) " << path << std::endl;
+ }
+ }
+
+ return result;
}
};
@@ -171,11 +271,13 @@ int main(int argc, char* argv[]) {
return 0;
}
+ int result = 0;
for (const auto& entry :
std::filesystem::directory_iterator(snapshot_dirpath)) {
- std::cout << entry.path().filename() << std::endl;
SnapshotData data(entry.path());
- data.runtests();
+ result |= data.runtests();
}
+
+ return result;
} |
Last week I ended with testing the current version and got seg-faults.
This Might already fix that. I will add the snapshot tests and then we can see what's still missing from this function. |
@wkearn: I added and ran the snapshot tests using the snapshot.cpp file of your branch. Running Is this path correct?
or should it be When running the function in python, it also completes without any issues, and the dem's contain no visible artifacts that point to something going wrong in a major way.
I have been using python with very small arrays (3x4) to individually test if each cell gets checked in the correct order and the correct neighbors are checked when applying the kernels. |
When you run
If you run
👍 |
CI does run the snapshot tests, but it is using the old snapshot data that does not have the debug-build:
runs-on: ubuntu-latest
env:
- snapshot_version: v1.3.0
+ snapshot_version: v1.4.0
steps:
- uses: actions/checkout@v4
with: This should allow it to run the |
It seems it's due to GDAL not beeing installed. I'm having some truble getting GDAL to work so I wasnt able to test it locally jet. |
You can always use the snapshot data to compare your results in Python, which might be easier than sorting out GDAL. We just don't have that set up automatically in pytopotoolbox yet. It looks like the acv snapshot tests fail on CI, but unfortunately there isn't a ton of visibility into the failure from the CI logs. I'll add an option to the CI that lets you download an artifact with all the snapshots in it, including the test output, which could be useful. One thing to note: if you find that your implementation is getting similar results to the snapshots that are off by a small amount, that is probably floating point errors. We might want to change the test condition from if (test_acv[j * dims[0] + i] != acv[j * dims[0] + i]) {
//...
} to something like if (fabsf(test_acv[j * dims[0] + i] - acv[j * dims[0] + i]) > 1e-4) {
// ...
} to account for possible floating point errors. I would say that any differences larger than 0.0001 or so are probably cause for further investigation, but this threshold is kind of arbitrary, so feel free to play around with it. |
If it is useful to get the snapshot output from CI, #152 should allow you to download a zip archive of the snapshot test output. Go to the summary page for a given run (e.g. https://github.com/TopoToolbox/libtopotoolbox/actions/runs/12046675226) and scroll down to the artifacts. |
I tested the snapshot results against the C implementation in Python like you suggested and something seems to be going wrong pretty consistently. The difference seems to be between 0.4 and 0.1 in all examples. So that's pretty bad especially considering it was better before. I'm investigating right now, but I don't think it's the NaN checks, because results are wrong all over. |
The northern slopes seem to perform better than southern ones. But both are far from 0.0001 precision. |
@wkearn apart from the issue that the precision is not good enough jet, there will probably be issues with the snapshot comparison. The NaN fields in the provided snapshots are different from the NaN areas in the DEM. So the NaN's in the snapshot data are not just one uniform field but have a few of different values. And depending on how the output array is provided, the NaNs in the output will be all zeros, for example. |
Right. We should probably change the snapshot test to only test valid (non-NaN) pixels. It is all right if the NaNs in the DEM end up with different output values in the libtopotoolbox or the MATLAB acv implementation. |
I have a hypothesis about what's happening. Unlike most of the other operations we have been working with, convolution is not agnostic to the row- or column-major layout of the array, because the filter coefficients are oriented in a particular way. When GDAL loads the data for the snapshot tests, it is stored with the x-axis (East-West) first, and the y-axis (North-South). Again, this doesn't usually matter because everything is usually symmetric: we just pass the x dimension first and the y dimension second and the function works as expected. However, this is opposite from how MATLAB's I think you got the right answers previously because you had assumed that the array was row-major, and the GDAL arrays are essentially row-major. I just ran the MATLAB implementation of ACV on Big Tujunga, but I transposed the So I think that's the problem. I'm actually not sure about the solution. We could just say "this array is column-major with columns representing the north-south axis and rows representing the east-west axis" and then fix the tests so that we load data in the right orientation. I don't really like that though. I'll have to think about it. |
I completed the
acv()
function. I kept part of the original MATLAB code in as comments to better show which parts are based on what code. Right now, the loops are written to support 'C' order arrays instead of the 'F' order we have been using before. This is mainly because the filters/kernels are, and accessing both for computation seemed easier this way.Is there a reason why we are using the 'F' array order for most functions? Is that convention when working with dem's/hydrology?
I tested this function by using the python toolbox and checking if each filter/kernel is working with the correct values. The following is the result when running it on the 'perfectworld' DEM.
closes #100